Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] 554 Remove skimage dependency in get_largest_connect_component #579

Conversation

YuanTingHsieh
Copy link
Collaborator

Fixes #554 .

Description

Remove skimage measure function and implement our own.

Status

Work in progress

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or new feature that would cause existing functionality to change)
  • New tests added to cover the changes
  • Docstrings/Documentation updated

@YuanTingHsieh
Copy link
Collaborator Author

/black

@YuanTingHsieh YuanTingHsieh changed the title [WIP] Remove skimage dependency in get_largest_connect_component [WIP] 554 Remove skimage dependency in get_largest_connect_component Jun 16, 2020
@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jun 17, 2020

Hi @YuanTingHsieh ,

Thanks for your quick PR.
As we discussed offline, you are right, the traditional method to find connected component is DFS or BFS as your PR, and the skimage lib uses UnionFind method from papers: https://sdm.lbl.gov/~kewu/ps/LBNL-56864.pdf and https://sdm.lbl.gov/~kewu/ps/paa-final.pdf.
I did some analysis to profile the time consuming of your PR:
image
Most of the time is in the red part, which is skipped in the UnionFind method.
So I think maybe we need to change to UnionFind method for better perf, and I found several open source python practices for your reference:
https://github.com/mahdiyousofun/connected-components-lables
https://github.com/danielenricocahall/Connected-Components-Labeling

Do you think it's doable to quickly evaluate it?
If too much effort needed, let's discuss it.

Thanks.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jun 22, 2020

Hi @YuanTingHsieh ,

I tried to adjust the program in https://github.com/mahdiyousofun/connected-components-lables and run locally. The perf is still much faster than BFS but still slower than skimage:
Tested 3 times on 512 x 512 binary image:

skimage:  0.00779271125793457s
UnionFind:  0.4875638484954834s
BFS:  7.813398122787476s

skimage:  0.007299900054931641s
UnionFind:  0.43839311599731445s
BFS:  7.3031721115112305s

skimage:  0.006865262985229492s
UnionFind:  0.4505500793457031s
BFS:  16.625162839889526s

I think maybe if we want to improve perf, we need to implement in C++, just like skimage.
So let's keep this ticket open and remove from next release.

here is my test program:

import math, random
from itertools import product
from .ufarray import *
import numpy as np


class UFarray:
    def __init__(self):
        # Array which holds label -> set equivalences
        self.P = []

        # Name of the next label, when one is created
        self.label = 0

    def makeLabel(self):
        r = self.label
        self.label += 1
        self.P.append(r)
        return r
    
    # Makes all nodes "in the path of node i" point to root
    def setRoot(self, i, root):
        while self.P[i] < i:
            j = self.P[i]
            self.P[i] = root
            i = j
        self.P[i] = root

    # Finds the root node of the tree containing node i
    def findRoot(self, i):
        while self.P[i] < i:
            i = self.P[i]
        return i
    
    # Finds the root of the tree containing node i
    # Simultaneously compresses the tree
    def find(self, i):
        root = self.findRoot(i)
        self.setRoot(i, root)
        return root
    
    # Joins the two trees containing nodes i and j
    # Modified to be less agressive about compressing paths
    # because performance was suffering some from over-compression
    def union(self, i, j):
        if i != j:
            root = self.findRoot(i)
            rootj = self.findRoot(j)
            if root > rootj: root = rootj
            self.setRoot(j, root)
            self.setRoot(i, root)
    
    def flatten(self):
        for i in range(1, len(self.P)):
            self.P[i] = self.P[self.P[i]]
    
    def flattenL(self):
        k = 1
        for i in range(1, len(self.P)):
            if self.P[i] < i:
                self.P[i] = self.P[self.P[i]]
            else:
                self.P[i] = k
                k += 1


def label_connected_regions_2d(data):
    height, width = data.shape
 
    # Union find data structure
    uf = UFarray()
 
    #
    # First pass
    #
 
    # Dictionary of point:label pairs
    labels = {}
 
    for x, y in product(range(height), range(width)):
 
        #
        # Pixel names were chosen as shown:
        #
        #   -------------
        #   | a | b | c |
        #   -------------
        #   | d | e |   |
        #   -------------
        #   |   |   |   |
        #   -------------
        #
        # The current pixel is e
        # a, b, c, and d are its neighbors of interest
        #
        # 255 is white, 0 is black
        # White pixels part of the background, so they are ignored
        # If a pixel lies outside the bounds of the image, it default to white
        #
 
        # If the current pixel is white, it's obviously not a component...
        if data[x, y] == 0:
            pass
 
        # If pixel b is in the image and black:
        #    a, d, and c are its neighbors, so they are all part of the same component
        #    Therefore, there is no reason to check their labels
        #    so simply assign b's label to e
        elif x > 0 and data[x-1, y] == 1:
            labels[x, y] = labels[x-1, y]
 
        # If pixel c is in the image and black:
        #    b is its neighbor, but a and d are not
        #    Therefore, we must check a and d's labels
        elif y+1 < width and x > 0 and data[x-1, y+1] == 1:
 
            c = labels[x-1, y+1]
            labels[x, y] = c
 
            # If pixel a is in the image and black:
            #    Then a and c are connected through e
            #    Therefore, we must union their sets
            if y > 0 and data[x-1, y-1] == 1:
                a = labels[x-1, y-1]
                uf.union(c, a)
 
            # If pixel d is in the image and black:
            #    Then d and c are connected through e
            #    Therefore we must union their sets
            elif y > 0 and data[x, y-1] == 1:
                d = labels[x, y-1]
                uf.union(c, d)
 
        # If pixel a is in the image and black:
        #    We already know b and c are white
        #    d is a's neighbor, so they already have the same label
        #    So simply assign a's label to e
        elif x > 0 and y > 0 and data[x-1, y-1] == 1:
            labels[x, y] = labels[x-1, y-1]
 
        # If pixel d is in the image and black
        #    We already know a, b, and c are white
        #    so simpy assign d's label to e
        elif y > 0 and data[x, y-1] == 1:
            labels[x, y] = labels[x, y-1]
 
        # All the neighboring pixels are white,
        # Therefore the current pixel is a new component
        else: 
            labels[x, y] = uf.makeLabel()
 
    #
    # Second pass
    #
 
    uf.flatten()

    output = np.zeros_like(data)
    color = {}
    counter = 0
    for (x, y) in labels:
 
        # Name of the component the current point belongs to
        component = uf.find(labels[x, y])

        # Update the labels with correct information
        labels[x, y] = component
        if component not in color:
            counter += 1
            color[component] = counter 

        # Colorize the image
        output[x, y] = color[component]

    return output

Thanks.

@YuanTingHsieh
Copy link
Collaborator Author

Hi Nic,

Thanks for trying 2D.
In that case is it really necessary to implement the same algorithm ourselves in C++?
If I am going to implement this in C++ I will still follow the logic in skimage.
And note that the implementation of skimage is restricted to 1, 2 or 3D.
https://github.com/scikit-image/scikit-image/blob/v0.17.2/skimage/measure/_ccomp.pyx#L90

So maybe we should add something in the docstring.

Thanks

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jun 22, 2020

Hi @YuanTingHsieh ,

Yes, I think it's necessary to implement it in C++ to improve the perf. But let's hold on this task for a while because we are still discussing how to add the C++ or CUDA code in MONAI(need to make the code) and the CI for them.

Thanks.

@YuanTingHsieh YuanTingHsieh deleted the 554-remove-skimage-in-get-largest-connected-component-mask branch November 19, 2020 22:57
@YuanTingHsieh YuanTingHsieh restored the 554-remove-skimage-in-get-largest-connected-component-mask branch November 19, 2020 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove skimage dependency in get_largest_connected_component_mask
2 participants